# 04. Catch Sample

The catch sample is a simple C/C++ program which links to the reinforcement learning library provided in the repository.

The environment is a 2-dimensional screen. A ball drops from the top of the screen and the agent is supposed to “catch” the ball before it hits the bottom of the screen. Its only allowed actions are left, right or none.
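
To make this concrete, here is one way the action space and paddle movement could be represented. This is a minimal sketch for illustration only; the enum ordering and the names `applyAction` and `playerX` are assumptions rather than the exact code in catch.cpp.

    // Illustrative sketch of the action space and how an action could move the paddle.
    // The enum ordering and helper name are assumptions, not taken from catch.cpp.
    enum catchAction
    {
        ACTION_STAY = 0,
        ACTION_LEFT,
        ACTION_RIGHT,
        NUM_ACTIONS
    };

    static void applyAction( int action, int& playerX, int gameWidth )
    {
        if( action == ACTION_LEFT && playerX > 0 )
            playerX--;              // move the paddle one cell to the left
        else if( action == ACTION_RIGHT && playerX < gameWidth - 1 )
            playerX++;              // move the paddle one cell to the right
                                    // ACTION_STAY leaves the paddle where it is
    }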

## catch Implementation

The main procedure in catch.cpp consists of the following sections:

  • Initialize and instantiate a dqnAgent
  • Allocate memory for the game
  • Set up the game state (ball location)
  • Game loop
    • Update the game state
    • Get the agent’s next action with NextAction()
    • Apply the action to the game
    • Compute the reward
    • Exit if game over

The dqnAgent is initialized with parameters defined at the start of the module:

    // Create reinforcement learner agent in pyTorch using API
    dqnAgent* agent = dqnAgent::Create(gameWidth, gameHeight, 
                       NUM_CHANNELS, NUM_ACTIONS, OPTIMIZER, 
                       LEARNING_RATE, REPLAY_MEMORY, BATCH_SIZE, 
                       GAMMA, EPS_START, EPS_END, EPS_DECAY,
                       USE_LSTM, LSTM_SIZE, ALLOW_RANDOM, DEBUG_DQN);
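
Putting these pieces together, the game loop can be sketched roughly as follows. This is a condensed outline under a few assumptions rather than the verbatim catch.cpp source: `updateBall()`, `applyAction()`, `computeReward()`, and `ballAtBottom()` are placeholders for the game logic, and the exact Tensor allocation call should be verified against the repository headers.

    // Condensed sketch of the main game loop (illustrative, not verbatim catch.cpp)
    Tensor* gameState = Tensor::Alloc(gameWidth, gameHeight, NUM_CHANNELS);  // verify exact signature in the repo
    int  playerX  = gameWidth / 2;      // paddle starts in the middle of the bottom row
    bool gameOver = false;

    while( !gameOver )
    {
        updateBall(gameState);                        // advance the ball one row (game logic)

        int action = 0;
        if( !agent->NextAction(gameState, &action) )  // query the DQN for the next action
            printf("failed to retrieve agent action\n");

        applyAction(action, playerX, gameWidth);      // move the paddle left/right/stay

        const float reward = computeReward();         // +1 / -1 / 0, see the quiz snippet below
        gameOver = ballAtBottom();                    // episode ends when the ball reaches the bottom

        if( !agent->NextReward(reward, gameOver) )    // report the reward back so the agent can train
            printf("failed to issue reward\n");
    }

Conceptually, NextAction() asks the agent to choose an action from the current game state, and NextReward() reports the resulting reward (and whether the episode ended) back to the agent so that it can learn from the outcome.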

## Quiz - Catch Rewards

As with the OpenAI Gym environments, the catch game environment must provide rewards to the agent based on the action the agent chooses. The reward function snippet from catch.cpp can be found in the main game loop, and is pasted below.

        // Compute reward
        float reward = 0.0f;

        if( currDist == 0 )
            reward = 1.0f;
        else if( currDist > prevDist )
            reward = -1.0f;
        else if( currDist < prevDist )
            reward = 1.0f;
        else if( currDist == prevDist )
            reward = 0.0f;

The variable currDist is the agent's current distance to the ball, and prevDist is the distance to the ball in the previous frame. Use the snippet to answer the following question:

What rewards are provided by the catch environment to the DQN agent? Check all correct boxes.

SOLUTION:
  • 0 if the ball is not getting closer or farther away
  • +1 if the ball is getting closer
  • -1 if the ball is getting farther away
  • +1 if the ball is caught
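
As a concrete illustration of the solution, the fragment below shows one plausible way the two distance values could be produced each frame, with worked cases in the comments. The horizontal-distance formulation and the names ballX and playerX are assumptions for illustration, not necessarily how catch.cpp measures distance.

    #include <cstdlib>   // for abs()

    // Hypothetical per-frame distance bookkeeping (illustrative only)
    const int prevDist = abs(ballX - playerX);   // distance before this frame's update

    /* ... ball falls one row, the agent's chosen action moves the paddle ... */

    const int currDist = abs(ballX - playerX);   // distance after the update

    // Worked cases: prevDist = 3, currDist = 2  -> ball got closer       -> reward = +1
    //               prevDist = 3, currDist = 4  -> ball got farther away -> reward = -1
    //               currDist == 0               -> ball was caught       -> reward = +1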

## Running catch

To test the textual catch sample, begin by opening the Udacity Workspace in a new window by clicking on this link.

Then, follow these steps:

  • When asked if you'd like to Enable GPU Mode, select [YES].
  • Click on the [Go to Desktop] button in the bottom right corner of the Workspace; this will open a new window.
  • If you get an error message that says something like No session for pid 55, click [OK] to close the window.

Next, open a terminal by clicking the Terminator icon on the desktop. Navigate to the folder containing the samples by typing the following in a terminal window:

cd /home/workspace/jetson-reinforcement/build/x86_64/bin

Then, run the catch executable from the terminal:

$ ./catch 

The terminal will list the initialization values, then print out results for each iteration. After around 100 episodes, the agent should start winning nearly 100% of the episodes. The following is an example output:

[deepRL]  input_width:    64
[deepRL]  input_height:   64
[deepRL]  input_channels: 1
...
WON! episode 1
001 for 001  (1.0000)  
WON! episode 5
004 for 005  (0.8000)  
...
WON! episode 110
078 for 110  (0.7091)  19 of last 20  (0.95)  (max=0.95)
WON! episode 111
079 for 111  (0.7117)  19 of last 20  (0.95)  (max=0.95)
WON! episode 112
080 for 112  (0.7143)  20 of last 20  (1.00)  (max=1.00)

Internally, catch is using the dqnAgent API from our C++ library to implement the learning.

## Alternate Arguments

There are some optional command-line parameters to catch that you can experiment with to change the dimensions of the environment and the size of the pixel-array input. Increasing the complexity this way lets you see how it impacts convergence and training time:

$ ./catch --width=96 --height=96
$ ./catch --render  # enable text output of the environment

With a 96x96 environment, the catch agent achieves >75% accuracy after around 150-200 episodes.
With a 128x128 environment, the catch agent achieves >75% accuracy after around 325 episodes.
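
For example, assuming the flags can be combined, the larger-environment run above could be launched with text rendering enabled:

$ ./catch --width=128 --height=128 --render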